Adapting the JIRS Passage Retrieval System to the Arabic Language
Abstract
The need for a Passage Retrieval (PR) system for Arabic texts stems essentially from our research team's aim to build an Arabic Question Answering (QA) system. We chose the PR system as our first step because it is the core component of a QA system, and its quality directly affects the performance of the whole system. The JAVA Information Retrieval System (JIRS) is a QA-oriented PR system that is multi-platform, open source and free to use. JIRS uses an n-gram model and is language-independent: it keeps language settings in separate configuration files, which makes it easier to adapt to any language. In this paper, we report the different challenges we faced when adapting JIRS to the Arabic language. In order to evaluate JIRS on Arabic, we had to develop an Arabic test-bed, using the multilingual CLEF QA test-bed as a guideline. We also report the results of experiments in which we retrieved Arabic passages with JIRS, first without any text preprocessing and then after applying light stemming to the documents of the test-bed. The preliminary results show that a first Arabic PR system can be obtained by adapting JIRS and applying it to text preprocessed with a light stemmer.
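To make the two ingredients named in the abstract concrete (n-gram matching and light stemming), the following Python sketch shows how a light-stemming pass can raise an n-gram overlap score between a question and a passage. It is a hypothetical illustration, not JIRS code and not the stemmer used in the experiments: `light_stem`, `stem_text` and `ngram_score` are names introduced here, the affix lists are assumptions loosely modeled on Light10-style Arabic light stemmers, and the score is a simplified length-weighted n-gram overlap with uniform term weights rather than the JIRS Distance Density model.

```python
# Hypothetical sketch: a toy Arabic light stemmer plus a simplified n-gram
# overlap score. Neither is the actual JIRS code nor the paper's stemmer.

# Illustrative affix lists, loosely modeled on Light10-style stemmers (assumption).
DEFINITE_ARTICLES = ["وال", "بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip a leading conjunction, one definite article, and listed suffixes."""
    # Drop a leading conjunction "و" if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Drop the first matching definite article if at least 2 characters remain.
    for prefix in DEFINITE_ARTICLES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # Try each suffix once, in list order, keeping at least 2 characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def stem_text(text: str) -> str:
    """Light-stem every whitespace-separated token of a text."""
    return " ".join(light_stem(w) for w in text.split())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Set of all contiguous n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_score(question: str, passage: str) -> float:
    """Share of question n-grams (all lengths) that occur in the passage.

    Each n-gram is weighted by its length n, so passages that contain longer
    stretches of the question score higher (simplified assumption).
    """
    q, p = question.split(), passage.split()
    matched = total = 0
    for n in range(1, len(q) + 1):
        q_grams = ngrams(q, n)
        total += n * len(q_grams)
        matched += n * len(q_grams & ngrams(p, n))
    return matched / total if total else 0.0


if __name__ == "__main__":
    question = "الكتاب الجديد"      # "the new book"
    passage = "قرأت كتابها الجديد"  # "I read her new book"
    print(ngram_score(question, passage))                        # 0.25
    print(ngram_score(stem_text(question), stem_text(passage)))  # 1.0
```

Without stemming, only the unigram الجديد matches. After stemming, both الكتاب and كتابها reduce to كتاب, so the question bigram also occurs in the passage and the score rises. A realistic pipeline would also normalize character variants and weight terms by how informative they are; both are left out of this sketch.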
Similar Articles
Re-ranking of Yahoo Snippets with the JIRS Passage Retrieval System
Passage Retrieval (PR) systems are used as the first step of current Question Answering (QA) systems. Usually, PR systems are traditional information retrieval systems that are not oriented to the specific problem of QA; in fact, these systems only search for the question keywords. The JIRS Distance Density n-gram system is a QA-oriented PR system which has given good results in QA tasks when this ...
Structure-Based Evaluation of an Arabic Semantic Query Expansion Using the JIRS Passage Retrieval System
TALP at GeoCLEF-2006: Experiments Using JIRS and Lucene with the ADL Feature Type Thesaurus
This paper describes our experiments in Geographical Information Retrieval (GIR) in the context of our participation in the GeoCLEF 2006 Monolingual English task. The TALPGeoIR system follows an architecture similar to that of the GeoTALP-IR system presented at GeoCLEF 2005 [2], with some changes in the retrieval modes and the Geographical Knowledge Base. The system has four phases performed sequentially...
N-Gram vs. Keyword-Based Passage Retrieval for Question Answering
In this paper we describe the participation of the Universidad Politécnica of Valencia in the 2006 edition, which focused on the comparison between a Passage Retrieval engine (JIRS) specifically aimed at the Question Answering task and a standard, general-purpose search engine such as Lucene. JIRS is based on n-grams, Lucene on keywords. We participated in three monolingual tasks: Spanish, Ital...
Boosting Passage Retrieval through Reuse in Question Answering
Question Answering (QA) is an important emerging field in Information Retrieval. In a QA system, the archive of previous questions asked of the system forms a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to improve performance in answering future related factoid questions. I...